Abstract

Background: Comparative studies of amniotes have been hindered by a dearth of reptilian molecular sequences. 
With the genomic assembly of the green anole, Anolis carolinensis, now available, non-avian reptilian genes can be 
compared to mammalian, avian, and amphibian homologs. Furthermore, with more than 350 extant species in the 
genus Anolis, anoles are an unparalleled example of tetrapod genetic diversity and divergence. Because anoles are an important 
ecological, genetic and now genomic reference, it is imperative to develop a standardized Anolis gene 
nomenclature alongside associated vocabularies and other useful metrics. 

Results: Here we report the formation of the Anolis Gene Nomenclature Committee (AGNC) and propose a 
standardized evolutionary characterization code that will help researchers to define gene orthology and paralogy 
with tetrapod homologs, provide a system for naming novel genes in Anolis and other reptiles, furnish 
abbreviations to facilitate comparative studies among the Anolis species and related iguanid squamates, and 
classify the geographical origins of Anolis subpopulations. 

Conclusions: This report has been generated in close consultation with members of the Anolis and genomic 
research communities, and using public database resources including NCBI and Ensembl. Updates will continue to 
be regularly posted to new research community websites such as lizardbase. We anticipate that this standardized 
gene nomenclature will facilitate the accessibility of reptilian sequences for comparative studies among tetrapods 
and will further serve as a template for other communities in their sequencing and annotation initiatives. 



Background 

As the rate of generating new sequence assemblies continues to accelerate, the remaining bottleneck is annotation. Although automated pipelines have been developed, it is still up to community initiatives to pool, evaluate, integrate, and disseminate the resources required for the functional and comparative annotations that support research needs. Multiple tools and resources, together with changing assemblies and annotations, present "moving-target" challenges for anyone attempting to assign function, orthology, nomenclature and other common vocabulary to genetic loci. The first challenge is that many assemblies are, or will be, periodically updated. These updates may come from resequencing efforts that aim to fill the ever-present gaps, from initiatives to provide a consensus reference sequence that accounts for the polymorphism present in a species, or from the deployment of different assembly algorithms. The second challenge is that the number of confidently assigned gene models on a fixed assembly generally correlates with the amount of effort that a community puts into annotating its genome of interest. A third challenge is that orthology assignments (and, by extension, functional assignments) depend on the quality and quantity of annotations from closely related genomes.

The recent publication of the genome sequence of the green anole, Anolis carolinensis, offers a rich trove of opportunities for biologists [1]. Comparing vertebrate genomes holds the promise of answering questions such as the genetic basis of human disease, in addition to improving our understanding of common evolutionary processes. Whole genome sequencing efforts in vertebrates have been carried out for 39 species of mammals (10 primates, 8 rodents, 12 laurasiatherians, 3 afrotherians, 2 xenarthrans, 3 marsupials, 1 monotreme), 3 birds (avian reptiles), 1 amphibian, and 5 teleost species [2,3]. Non-avian reptiles have been missing from this taxonomic survey of genomes, and the publication of a whole genome assembly for the green anole helps to fill this gap [1]. As a complement to this effort, a growing number of online resources are available to the Anolis community (Table 1).

Mammals, birds, and non-avian reptiles are grouped together as amniotes because they share features that include a characteristic egg adapted to terrestrial reproduction. Within the amniotes, mammals are estimated to have diverged from the reptiles over 300 million years ago (mya) [4]. The Reptilia contain three major lineages. The first is the Archosauria, which contains crocodilians, dinosaurs and birds, and whose most recent common ancestor lived approximately 250 mya. The second is the Lepidosauria, which contains the Squamata (lizards and snakes) and the tuatara (a lizard-like reptile found only in New Zealand). The third is the Anapsida, or turtles. This first non-avian reptile genome sequence will be invaluable as an outgroup for comparative analyses of the increasing number of amniote genomes.

For the past century, A. carolinensis, which is native to the southeastern US, has been a lizard of choice for comparative studies in ecology, evolutionary biology, behavior, physiology and neuroscience. With genomic and transcriptomic sequences available, A. carolinensis is also emerging as an important model organism for cellular, molecular, developmental and regenerative studies. Furthermore, A. carolinensis is only one of over 350 described species of Anolis, making it a member of one of the most species-rich clades of tetrapods [4].

Comparative genomic research at all taxonomic levels would be facilitated by a consistent system of gene nomenclature for A. carolinensis, the first sequenced non-avian reptile. Towards this goal, members of the Anolis research community have established the Anolis Gene Nomenclature Committee (AGNC) to generate and maintain standardized gene vocabularies. As a companion to the publication of the first non-avian reptile genome, we present this report as the first step in an evolving document.

Report and Discussion 

Establishing evolutionary metrics to help evaluate orthology between anoles and other vertebrates

In the annotation process, finding orthologous relationships across species has become an important tool for evaluating gene identity [5]. However, determining gene orthology is not a trivial exercise. Vertebrate genomes have a dynamic history that includes countless deletions and duplications, a constant stream of genomic rearrangements (including at least two whole genome duplications), and divergence in both gene expression and protein function. Fortunately, for many genes, orthologs can be reliably determined from reciprocal protein similarity. For other genes, sequence divergence means that synteny (gene order) conservation and functional analysis must also be considered. Below, we present the challenges involved in maintaining an evolving, community-accepted record of gene ancestry, and we briefly review the current state of assigning orthology with presently available resources and tools. We then propose criteria for evaluating gene orthology and paralogy. The aim is a multi-metric summary for each gene that measures the confidence with which the investigator can assign orthology.

Table 1 Anolis online databases and resources

Database | Resources/tools available | URL
Anole Annals | Blog updated regularly and focused on the latest Anolis research | http://www.anoleannals.wordpress.com
Anolis Genome | Anolis genomic and expression data | http://www.anolisgenome.org
Anolis Genome Project | Primary site for the genome sequencing effort by the Broad Institute | http://www.broadinstitute.org/models/anole
Anolis Newsletter | Manuscripts and reports generated by the Anolis community | http://anolis.oeb.harvard.edu
Ensembl | Anolis carolinensis portal, genome and annotations | http://www.ensembl.org/Anolis_carolinensis/Info/Index
lizardbase | Anolis genome browser; GIS data mapping; gene nomenclature resources; Anolis educational materials | http://www.lizardbase.org
NCBI UniGene | Anolis carolinensis transcripts | http://www.ncbi.nlm.nih.gov/UniGene/UGOrg.cgi?TAXID=28377
UCSC | Anolis carolinensis portal; comparative genomic tracks | http://www.genome.ucsc.edu/cgi-bin/hgGateway

Resources and challenges for assigning orthology

Confidence in genome assembly
High-quality whole genome assemblies are essential for confidence in comparative analysis. The genome of A. carolinensis (estimated to be 1.78 Gbp) was first assembled in March 2007 from shotgun reads at a depth of 6.85X (AnoCar1.0) [1]. The second iteration of the genome assembly (AnoCar2.0) was released in May 2010 and had increased coverage (7.10X). The AnoCar2.0 assembly incorporated 6,645 scaffolds composed of 41,985 contigs, with a supercontig N50 of 4.0 Mbp. Scaffolds were anchored to chromosomes by FISH mapping using 405 BACs. Increased genome coverage from new sequencing efforts is anticipated in the upcoming years. Improved assemblies will make conserved syntenic blocks easier to recognize, which will greatly help in identifying orthologs with confidence.
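The supercontig N50 quoted above is a standard measure of assembly contiguity. It is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The minimal Python sketch below shows the calculation. The input lengths are toy values, not AnoCar data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and their lengths summed;
    the N50 is the length at which the running total first reaches half of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length


# Toy scaffold lengths (bp), for illustration only.
print(n50([100, 80, 50, 30, 20, 10]))  # 80
```

N50 rewards a few long scaffolds. Two assemblies with the same total size can therefore differ greatly in N50, which is why the statistic is reported alongside scaffold and contig counts.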

Confidence in gene models
Our inference of gene orthology depends on the quality of gene annotations among the multiple species compared. Waiting for large public genome databases such as EMBL-EBI/Sanger's Ensembl and NCBI's UniGene to generate gene models and clusters is the simplest route to reliable annotations. However, the lag from assembly release to the start of an annotation build is currently at least four months, and the annotation can take over a year to become publicly available. Ensembl currently produces a fairly quick and reliable gene build that combines ab initio gene predictions, comparative genomics, and experimental resources such as ESTs (doi:10.1101/gr.1858004). Ensembl GeneBuild58.1b dramatically increased the number of genes annotated in A. carolinensis, from a pre-genome list of 36 loci to a genome-wide set (based on AnoCar1.0) of 11,932 loci. Among these initial annotations, 4,793 new genes were discovered, along with 471 pseudogenes and 3,099 RNA genes, for a total of 20,885 transcripts. UniGene, by contrast, clusters ESTs and mRNAs, and UniGene Build version 2 described 26,575 transcript clusters. How, then, do we compare the quality of these annotation sets? Some model organism databases apply confidence scores. In FlyBase [6], a single-digit scoring metric is assigned from three classes of evidence: ab initio gene prediction algorithms, aligned nucleotide sequences, and overlapping regions of protein similarity. FlyBase plans to extend its transcript confidence scores with support from comparative genomics and proteomic analyses, and potentially to report the magnitude and quality of each type of support. Comparable approaches are planned for A. carolinensis (see below).
Confidence in aligned assemblies from nearby taxa
Amphibian and reptilian sequences are scarce compared with mammalian genomes, and this makes comparative analysis difficult. When entire vertebrate clades depend on the annotations of a single genome, errors in comparative analysis are likely. As more annotated assemblies become available, we should be able to test and refine current assignments of orthologous and paralogous relationships. Yet not all annotations are created equal. Model organisms such as chicken, mouse, rat and zebrafish have more comprehensive annotations because they receive more resources and have larger active research communities. The challenge, therefore, is to develop an annotation approach that keeps pace with the rapidly expanding number of whole genome sequences being produced.

Currently available orthology pipelines
Ancestral relationships between loci from selected species can be extracted with a variety of ready-built pipelines. The major databanks provide orthology/paralogy relationships for completed genomes through well-established data workflows. Ensembl's orthology and paralogy relationships are based on a maximum likelihood tree-building algorithm, TreeBeST [7]. NCBI's HomoloGene uses a clustering approach based on an initial blastp search [8]. The UCSC Genome Browser also generates a comparative genomic table for selected sequenced species [9,10]. Other databases that specifically identify orthology/homology include the Orthologous Matrix Project (OMA) [11,12], InParanoid [13,14], TreeFam [15,16], OPTIC [17,18], and Evola [19,20]. Notably, the HUGO Gene Nomenclature Committee (HGNC) has built a meta-comparison tool, HCOP (HGNC Comparison of Orthology Predictions). HCOP records which of the aforementioned pipelines support a given orthology call, which makes it a valuable resource for assessing overall confidence [21]. A major challenge for bioinformatics research is keeping up with an ever-changing landscape of software tools. Computer-savvy researchers must evaluate workflows regularly, and, most importantly, knowledgeable biologists must validate the results.
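Several of the pipelines above start from pairwise similarity searches such as blastp. The simplest orthology criterion built on such searches is the reciprocal best hit, a form of the reciprocal protein similarity mentioned earlier. It can be sketched as follows. This is an illustration only, not the algorithm of any named pipeline. The gene identifiers and bit scores are hypothetical, and ties are broken by keeping the first hit seen.

```python
from typing import Dict, List, Tuple

# Similarity scores (e.g., blastp bit scores) keyed by (query, subject).
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical anole (g1, g2) versus chicken (x, y) searches.
anole_vs_chicken = {("g1", "x"): 250.0, ("g1", "y"): 90.0,
                    ("g2", "x"): 240.0, ("g2", "y"): 60.0}
chicken_vs_anole = {("x", "g1"): 245.0, ("x", "g2"): 238.0,
                    ("y", "g1"): 85.0, ("y", "g2"): 58.0}
print(reciprocal_best_hits(anole_vs_chicken, chicken_vs_anole))  # [('g1', 'x')]
```

In this example g2 has no reciprocal partner: its best hit, x, prefers g1. Genes like g2, which may be lineage-specific paralogs, are the cases where synteny and the other criteria proposed in this report become necessary.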
Towards community-driven evaluations of orthology
As genomic sequence data accumulate ever faster, even a well-organized mechanism for assigning orthology can be overwhelmed. A community-driven effort to characterize each gene's evolutionary history, and our confidence in that summary, will be useful to the Anolis community and beyond. We propose that the Anolis research community work together to initiate, and ultimately complement, these efforts by building a pipeline that follows a common set of guidelines and maintains relationships with the large genomic databanks. Towards this end, the AGNC has established working relationships with representatives from a network of relevant databases.

Developing a common set of guidelines is the major focus of the AGNC in the upcoming year. Ultimately, we aim to generate a weighted point system that takes into account the different types of characteristics being compared. Where substantial ambiguity remains, the AGNC plans to work with researchers and the database community on preliminary assignments. In the interim, we propose the following framework as a starting point:

Species/taxa for comparative analysis
Multiple alignment programs such as ClustalW [22], MUSCLE [23] and T-COFFEE [24] provide accessible tools for aligning sequences from multiple species. The presence or absence of reliable alignments can indicate the lineage to which a gene is restricted. All comparative analyses should include a common starting set of genomes to align to:

♦ Mammals: 2 eutherians, preferably mouse and human, plus marsupial and monotreme genes if available.

♦ Birds (avian reptiles): zebra finch and chicken.

♦ Non-avian reptiles: any additional gene sequences as available, particularly for non-squamate species (turtles or crocodilians).

♦ Amphibians: Xenopus tropicalis and additional genomes as available.

♦ Teleosts: zebrafish and either Fugu rubripes or Tetraodon nigroviridis should be included. Additional teleosts (stickleback, medaka) can also be analyzed.

♦ Non-vertebrate chordates: if available, either Ciona intestinalis or Ciona savignyi can serve as a chordate outgroup, as an alternative to Drosophila melanogaster.

Protein sequence analysis
Sequence analysis programs such as MEGA [25] and PAML [26] provide accessible tools for analyzing protein alignments across multiple species. Protein divergence will be estimated with a codon-substitution matrix as dN (nonsynonymous, i.e., amino acid-altering, divergence) and dS (synonymous, or silent-site, divergence). Divergence estimates will vary widely across proteins; however, confidence in an alignment can be evaluated by comparing its estimates with those of other proteins. In particular, dS will serve as a neutral divergence marker among vertebrates, while dN will provide a rough indicator of sequence alignment quality across larger phylogenetic distances.
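The framework above specifies a codon-substitution matrix. As a simpler illustration of what dN and dS measure, the sketch below uses a counting method instead: the Nei-Gojobori (1986) approach with a Jukes-Cantor correction, which is one of several estimators available in programs such as MEGA. It is not a substitute for the likelihood methods in PAML. The sketch rests on several assumptions:

- Sequences are gap-free, codon-aligned, and free of stop codons.
- When sites are counted, a change to a stop codon is treated as nonsynonymous.
- Codons that differ at two or three positions are averaged over all stop-free mutational pathways.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites (0-3) in a sense codon.

    Each of the nine single-base changes contributes 1/3 of a site if it
    leaves the amino acid unchanged; changes to a stop count as nonsynonymous.
    """
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1: str, c2: str) -> tuple:
    """Synonymous and nonsynonymous differences between two codons,
    averaged over all stepwise mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple:
    """Return (dN, dS) for two codon-aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("stop codons are not allowed")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

The two codons AAA and AAG both encode lysine, so they differ by one synonymous change. TTA and CTG both encode leucine, and every pathway between them contributes two synonymous changes. Very short or highly divergent sequences can push a proportion past 0.75, where the correction is undefined. This is one practical reason dS saturates across the deepest vertebrate comparisons.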

Orthology/paralogy relationships
Copy-number information for each gene can be extracted from the alignments. A number of databases (e.g., Ensembl) also provide this information in their orthology pipelines. Relationships such as 1:1, 1:n and n:n (where n is an integer greater than 1) are instructive to users interested in gene families and how they evolved between lizards and a reference genome such as chicken.
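The copy-number relationships above can be derived from pairwise orthology calls. The minimal sketch below assumes those calls are available as (anole gene, reference gene) pairs. It groups genes that are linked directly or through shared partners and labels each group by counting genes on each side. The gene names are hypothetical. The "n:1" label is added here as the mirror image of "1:n". Pipelines such as Ensembl derive their labels from gene trees rather than from this simple grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_orthology(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each anole gene as 1:1, 1:n, n:1 or n:n relative to a reference.

    pairs: (anole_gene, reference_gene) orthology calls. Genes linked
    directly or through shared partners form one orthology group, and the
    label counts anole genes (left) against reference genes (right).
    """
    graph = defaultdict(set)
    for anole, ref in pairs:
        graph[("anole", anole)].add(("ref", ref))
        graph[("ref", ref)].add(("anole", anole))

    labels: Dict[str, str] = {}
    seen = set()
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first search collects one connected orthology group.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        anole_genes = [name for side, name in group if side == "anole"]
        n_ref = len(group) - len(anole_genes)
        label = ("1" if len(anole_genes) == 1 else "n") + ":" + ("1" if n_ref == 1 else "n")
        for gene in anole_genes:
            labels[gene] = label
    return labels


# Hypothetical calls: a simple ortholog, an anole-specific duplicate pair,
# and one anole gene matching two chicken paralogs.
calls = [("hoxa1", "HOXA1"), ("gene2a", "GENE2"), ("gene2b", "GENE2"),
         ("fam1", "FAM1A"), ("fam1", "FAM1B")]
# hoxa1 -> 1:1, gene2a and gene2b -> n:1, fam1 -> 1:n
print(classify_orthology(calls))
```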
Predicted transcript sequence analysis
Building on an approach used by FlyBase [6], each transcript receives a score equal to the sum of the points for the following categories. Because the categories are worth 1, 2, 4 and 8 points, the sum ranges from 0 to 15 and identifies exactly which kinds of evidence are present:

♦ 1 point if one or more EST sequences align to the annotated transcript,

♦ 2 points if an annotated exon intersects a region of 
aligned protein similarity (of course, similarity to self 
is excluded), 

♦ 4 points if there is any gene prediction that is fully 
consistent with the annotated transcript, and 

♦ 8 points if one or more aligned cDNAs are fully 
consistent with the annotated transcript. 
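The four categories above are worth 1, 2, 4 and 8 points, so each one occupies its own bit. A score can therefore be decoded back into the evidence behind it. The minimal sketch below models this with Python's IntFlag and assumes only the four categories listed. How each kind of evidence is detected (alignment thresholds and so on) is left out; those criteria would have to be defined by the AGNC.

```python
from enum import IntFlag


class Evidence(IntFlag):
    """Transcript evidence categories; each is a distinct power of two."""
    EST = 1              # one or more aligned ESTs overlap the transcript
    PROTEIN = 2          # an exon overlaps aligned protein similarity (not self)
    GENE_PREDICTION = 4  # a gene prediction is fully consistent
    CDNA = 8             # one or more aligned cDNAs are fully consistent


def evidence_score(est=False, protein=False, prediction=False, cdna=False) -> int:
    """Sum the points for each category of supporting evidence."""
    score = Evidence(0)
    for present, flag in ((est, Evidence.EST), (protein, Evidence.PROTEIN),
                          (prediction, Evidence.GENE_PREDICTION),
                          (cdna, Evidence.CDNA)):
        if present:
            score |= flag
    return int(score)


def decode(score: int) -> list:
    """Recover which kinds of evidence a score stands for."""
    return [flag.name for flag in Evidence if score & flag]


print(evidence_score(est=True, prediction=True))  # 5
print(decode(5))  # ['EST', 'GENE_PREDICTION']
```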

Experimentally defined transcript sequence and alternative splicing
EST or full-length cDNA transcript sequence is highly preferable to predicted annotations and should be used at every opportunity. Suggested parameters are currently as defined above. For alternative splicing, finding similar splicing patterns in the species being compared greatly increases confidence in an orthologous relationship.

Synteny conservation
At a minimum, orthology can be recognized by the presence of at least two genes, in sequential order on either the 5' or the 3' flank, that are orthologous to the corresponding flanking genes in Gallus gallus. Confidence increases with additional orthologous genes on one flank, or with synteny conservation on both flanks.
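The flank-counting criterion above can be sketched as follows. The sketch assumes the following:

- Gene order is given as simple lists around the gene of interest.
- "Upstream" and "downstream" mean earlier and later positions in the anole list, ignoring strand.
- A dictionary maps anole genes to their chicken orthologs.

Both orientations of the chicken segment are tried, because a syntenic block may be inverted. All gene names are hypothetical.

```python
from typing import Dict, List, Tuple


def conserved_flanks(anole: List[str], reference: List[str], target: str,
                     orthologs: Dict[str, str]) -> Tuple[int, int]:
    """Count consecutive conserved neighbours on each side of a gene.

    Returns (upstream, downstream): how many neighbours before and after
    the target, moving outwards, have orthologs in matching order beside
    the reference ortholog. The better of the two orientations is kept.
    """
    t = anole.index(target)
    r = reference.index(orthologs[target])
    best = (0, 0)
    for orientation in (1, -1):
        counts = []
        for direction in (-1, 1):  # -1: before the target, 1: after it
            k = 0
            while True:
                i = t + direction * (k + 1)
                j = r + orientation * direction * (k + 1)
                if not (0 <= i < len(anole) and 0 <= j < len(reference)):
                    break
                if orthologs.get(anole[i]) != reference[j]:
                    break
                k += 1
            counts.append(k)
        if sum(counts) > sum(best):
            best = (counts[0], counts[1])
    return best


def meets_minimum(flanks: Tuple[int, int], minimum: int = 2) -> bool:
    """True if at least `minimum` ordered orthologs sit on either flank."""
    return max(flanks) >= minimum


# Hypothetical segments: gene g and its chicken ortholog G.
anole = ["a1", "a2", "a3", "g", "a4", "a5"]
chicken = ["c1", "c2", "c3", "G", "c4", "cX"]
orthologs = {"a1": "c1", "a2": "c2", "a3": "c3", "g": "G", "a4": "c4", "a5": "c5"}
flanks = conserved_flanks(anole, chicken, "g", orthologs)
print(flanks, meets_minimum(flanks))  # (3, 1) True
```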
Gene expression
After gene duplication, divergence of regulatory control regions can cause paralogous genes to differ in the tissue specificity and timing of their expression. These regulatory regions are considered part of the gene being compared, but it is not straightforward to score their divergence. Genes that appear orthologous by the measures above can still show strikingly different expression patterns. This raises the question of whether regulatory function has diverged even though the protein-coding sequence has been conserved. This is one of the most difficult comparisons to evaluate. As more comparative analyses are reported, the AGNC aims to develop proposals for annotating genes whose sequence and expression data suggest contradictory conclusions about the descent of gene function.

Much of the above information can be collated into a single colon-separated string. This string gives the AGNC a single metric for evaluating nomenclature and gives the user an instant confidence metric. Because this evolutionary character code (ECC) would change with the input data, it would simply be linked to the gene as a separate feature. For example, a hypothetical "gene2" would be annotated with the gene description gene2:chordates:80, 55:1-1:5:3, 4:TS. This code means the following:

♦ gene2 has orthologs only within chordates, with 80% overall protein identity and 55% overall nucleotide identity (alternatively, dN and dS can be used);

♦ it has no paralogs within or between species (a 1:1 relationship with chicken);

♦ it has both gene prediction and EST evidence (a score of 5, i.e., 1 + 4);

♦ it shows synteny conservation with the reference species for 3 genes upstream and 4 genes downstream; and

♦ it shows tissue-specific expression in a cross-species comparison (e.g., with mouse).
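A consistent machine-readable form of the ECC would make it easy to store and compare. The sketch below builds and parses the seven fields in the order used in the gene2 example. Several details are assumptions, not an AGNC specification:

- The class and field names are hypothetical.
- Spaces after commas are treated as insignificant.
- The copy-number field uses "-" because ":" is already the field separator.

```python
from dataclasses import dataclass
from typing import Tuple


def _pair(field: str) -> Tuple[int, int]:
    """Parse 'x,y' (spaces allowed) into two integers."""
    first, second = (int(part) for part in field.split(","))
    return first, second


@dataclass
class EvolutionaryCharacterCode:
    gene: str
    clade: str                 # clade within which orthologs are found
    identity: Tuple[int, int]  # % protein identity, % nucleotide identity
    copy_number: str           # e.g. "1-1"; '-' stands in for ':'
    evidence: int              # transcript evidence score
    synteny: Tuple[int, int]   # conserved genes upstream, downstream
    expression: str            # e.g. "TS" for tissue-specific

    def __str__(self) -> str:
        return ":".join([
            self.gene, self.clade,
            f"{self.identity[0]},{self.identity[1]}",
            self.copy_number, str(self.evidence),
            f"{self.synteny[0]},{self.synteny[1]}", self.expression,
        ])

    @classmethod
    def parse(cls, text: str) -> "EvolutionaryCharacterCode":
        fields = [field.strip() for field in text.split(":")]
        if len(fields) != 7:
            raise ValueError(f"expected 7 colon-separated fields, got {len(fields)}")
        gene, clade, identity, copy_number, evidence, synteny, expression = fields
        return cls(gene, clade, _pair(identity), copy_number,
                   int(evidence), _pair(synteny), expression)


ecc = EvolutionaryCharacterCode.parse("gene2:chordates:80, 55:1-1:5:3, 4:TS")
print(ecc.identity, ecc.synteny)  # (80, 55) (3, 4)
print(ecc)  # gene2:chordates:80,55:1-1:5:3,4:TS
```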

Once a reliable set of orthologous relationships is adopted, downstream functional and comparative annotations and alignments could quickly be generated for the entire community. For example, Gene Ontology (GO) annotations can easily be transferred once orthologies are assigned. The chicken genome is one of the twelve "reference" genomes that the Gene Ontology database is carefully annotating with controlled ontological vocabulary [27]. The A. carolinensis genome is therefore in an excellent position to be reliably annotated with associated GO terms.
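A minimal sketch of orthology-based GO transfer follows. It assumes, as a design choice of this example rather than an AGNC rule, that terms are copied only across 1:1 orthologs, because duplicated copies may have diverged in function. The gene names and gene-to-term assignments are hypothetical; the GO identifiers simply follow the standard GO:nnnnnnn format.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def transfer_go(orthologs: Iterable[Tuple[str, str]],
                reference_go: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Copy GO term IDs from reference genes to anole genes with 1:1 orthologs.

    orthologs: (anole_gene, reference_gene) pairs.
    reference_go: reference gene -> set of GO term IDs.
    A pair is 1:1 when neither gene appears in any other pair; genes in
    1:n, n:1 or n:n relationships are skipped.
    """
    pairs = list(orthologs)
    anole_counts = Counter(anole for anole, _ in pairs)
    ref_counts = Counter(ref for _, ref in pairs)
    transferred: Dict[str, Set[str]] = {}
    for anole, ref in pairs:
        if anole_counts[anole] == 1 and ref_counts[ref] == 1 and ref in reference_go:
            transferred[anole] = set(reference_go[ref])
    return transferred


calls = [("hoxa1", "HOXA1"), ("gene2a", "GENE2"), ("gene2b", "GENE2")]
reference_go = {"HOXA1": {"GO:0003677"}, "GENE2": {"GO:0005515"}}
print(transfer_go(calls, reference_go))  # {'hoxa1': {'GO:0003677'}}
```

The duplicated gene2a/gene2b pair receives no terms here. A more permissive policy could transfer terms to all copies and flag them for manual review.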

These data must be disseminated quickly to the community through regularly updated databases. The Anolis community already has one such database preparing for the next generation of data sets: lizardbase [28]. It is the primary community website and anole resource, and it includes a mapping portal for both geographical and genome-based data. Such community-serving databases must coordinate their efforts to provide consensus datasets.

Nomenclature for Anolis gene names and symbols 

Analysis of the chicken and zebra finch genomes has demonstrated that, while most genes can be assigned clear orthologs, functional genes unique to the avian lineage require additional analysis [29]. For the A. carolinensis genome, gene nomenclature must both indicate orthology with other vertebrates clearly and allow non-avian, reptile-specific genes to be identified. The AGNC has reviewed guidelines issued by the gene nomenclature organizations of several communities:

♦ mammalian: HUGO Gene Nomenclature Committee, HGNC; International Committee on Standardized Gene Nomenclature for Mice

♦ avian reptile: Chicken Gene Nomenclature Committee [30]

♦ amphibian: Xenbase [31,32]

♦ teleost: ZFIN, Zebrafish Information Network [33,34]

A major consideration for gene nomenclature in A. carolinensis is flexibility for comparisons with other amniote genomes. Anolis genes will most likely be compared with human, mouse, or chicken orthologs. The AGNC therefore proposes a gene symbol style that lets the reader tell an Anolis symbol apart from the symbols of these species from the symbol alone. For a hypothetical gene named "gene2", the likely species for cross-comparison are:

GENE2, human (Homo sapiens): all capitals, italicized

Gene2, mouse (Mus musculus): first letter capitalized, italicized

GENE2, chicken (Gallus gallus): all capitals, italicized

gene2, Xenopus tropicalis: all lower case, italicized

gene2, zebrafish (Danio rerio): all lower case, italicized

To make it easier to distinguish a reference to an Anolis gene in comparisons with human, mouse, and avian orthologs, the AGNC proposes a gene symbol style similar to that of Xenopus tropicalis and zebrafish, i.e.,

gene2, Anolis carolinensis: all lower case, italicized 

Further details of these guidelines are presented 
below. 

Gene symbols 

• Gene symbols for all Anolis species should be written in lower case only and in italics, e.g., gene2.

• Whenever the criteria for orthology have been met (see the previous section), the Anolis gene symbol should correspond to the human gene symbol. For example, if the human gene symbol is GENE2, the Anolis gene symbol would be gene2. Where the human and mouse symbols differ, the AGNC asks the investigator to contact the AGNC through lizardbase to determine a suitable gene symbol for Anolis.

• Orthologous genes in other Anolis species should have the same gene symbol and name as in A. carolinensis. A proposed abbreviation code system for comparisons among Anolis species is presented in a later section (see Table 2).

• Gene symbols should contain only ASCII characters (Latin alphabet, Arabic numerals).

• Punctuation (dashes, periods, slashes) should not be used unless it is part of a human or mouse gene symbol. For example, if the human gene symbol is NKX3-1, the Anolis gene symbol should be nkx3-1.

• Gene names: In other model systems, a gene nomenclature committee typically maintains a unique database of gene symbols, but full gene names vary more. Whenever possible, the human or mouse gene name should be used, omitting references to homology or disease descriptions, e.g., "delta-like 1", not "delta-like 1 (Drosophila)". Provisional human or mouse gene names, e.g., KIAA# or C#orf, should not be used as the basis for a gene name in Anolis species.

• Novel gene names and symbols: If an orthologous gene cannot be identified in any currently sequenced genome, the investigators may select a novel name. The name should ideally be brief and convey information about the gene's expression or function. It should not include proper or commercial names, e.g., yep1, yolk expressed protein 1. References to molecular weight should be avoided, i.e., do not use p35, 35 kDa protein.

• Gene symbols should not start with "A" or "Ac" as an abbreviation for Anolis carolinensis, i.e., not acgene2. Gene symbols may start with "a" or "ac" if the human or mouse ortholog starts with these letters, e.g., actb for beta-actin.

• Using the criteria for orthology described in the previous section, duplication of the Anolis ortholog of a mammalian gene will be indicated by an "a" or "b" suffix, e.g., gene2a and gene2b. If the mammalian gene symbol already contains a suffix letter, a second letter is added, e.g., gene4aa and gene4ab.

Protein symbols 

♦ Protein symbols should be the same as the gene symbol except written in all upper case without italics, e.g., GENE2.
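Several of the symbol rules above are mechanical and can be checked automatically. The sketch below uses hypothetical helper functions (not an AGNC tool) and covers three things:

- deriving an Anolis symbol from a human ortholog symbol, including paralog suffixes;
- deriving the protein symbol;
- flagging some problems in proposed novel symbols.

Italics cannot be represented in plain strings. As a simplification, the "ac" prefix check flags only "ac", not every leading "a".

```python
import re

# Lowercase letters and digits, with single '-', '.' or '/' separators
# (punctuation is only kept when it is part of the ortholog's symbol).
SYMBOL = re.compile(r"[a-z0-9]+(?:[-./][a-z0-9]+)*")


def anolis_symbol(ortholog_symbol: str, paralog: str = "") -> str:
    """Derive an Anolis gene symbol from a human (or mouse) ortholog symbol.

    paralog: optional letter for an Anolis-specific duplicate; appending it
    gives gene2 -> gene2a and, when a suffix letter already exists,
    gene4a -> gene4aa.
    """
    if paralog and not re.fullmatch(r"[a-z]", paralog):
        raise ValueError("paralog suffix must be a single lowercase letter")
    symbol = ortholog_symbol.lower() + paralog
    if not symbol.isascii() or not SYMBOL.fullmatch(symbol):
        raise ValueError(f"not a valid Anolis gene symbol: {symbol!r}")
    return symbol


def protein_symbol(gene_symbol: str) -> str:
    """Protein symbols are the gene symbol in upper case (set in roman type)."""
    return gene_symbol.upper()


def novel_symbol_problems(symbol: str) -> list:
    """Flag some guideline violations in a proposed novel gene symbol."""
    problems = []
    if not (symbol.isascii() and re.fullmatch(r"[a-z0-9]+", symbol)):
        problems.append("use lowercase ASCII letters and digits only")
    if symbol.startswith("ac"):
        problems.append("'ac' must not be used as an Anolis carolinensis prefix")
    if re.fullmatch(r"p\d+", symbol):
        problems.append("avoid molecular-weight names such as p35")
    return problems


print(anolis_symbol("NKX3-1"))       # nkx3-1
print(anolis_symbol("GENE4A", "b"))  # gene4ab
print(protein_symbol("gene2"))       # GENE2
print(novel_symbol_problems("yep1"))  # []
```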

Nomenclature for Anolis non-coding sequences, including transposons and repetitive elements

The classification and nomenclature of transposable elements present a particular challenge because of the great diversity of transposons in eukaryotic genomes. Several classification and naming schemes have been proposed, but there is currently no consensus on how transposons should be annotated [35,36]. An ideal classification system for transposable elements should reflect the evolutionary relationships among elements [37]. However, because eukaryotic genomes are annotated independently of each other, transposon families have tended to be numbered in the order of their discovery, with little consideration of their evolutionary affinities across genomes [38]. Scientists agree on the major categories of transposable elements (DNA transposons, non-LTR retrotransposons and LTR retrotransposons). There is no consensus, however, on their classification at lower levels (families and subfamilies) or on how to name newly discovered transposons. The nomenclature of transposons can thus be considered a work in progress. An International Committee on the Classification of Transposable Elements has been created. It aims to build a classification that reflects the structural and evolutionary affinities among elements while remaining relatively easy to use. Until the transposable element community reaches a consensus, we propose some simple guidelines for the nomenclature of transposable elements in A. carolinensis.

The general principles of the nomenclature follow the recommendations of Kapitonov and Jurka [37], with some minor modifications. Kapitonov and Jurka proposed naming an element by the super-family to which it belongs, followed by a unique identifier (generally a number), a structural identifier if necessary, and finally a species identifier. For example, Helitron-1_Acar would be the name of family 1 of autonomous Helitrons in A. carolinensis. If Helitron-1_Acar has amplified a non-autonomous family of helitrons, that family will be named Helitron-1N1_Acar, the N indicating its non-autonomous nature. However, the diversity within some super-families is relatively well known, at least in vertebrates. For these, we propose that element names should reflect evolutionary affinities below the super-family level. For instance, the hAT super-family contains several well-defined monophyletic lineages (e.g., hobo, Charlie, restless). Where the diversity of a super-family is well characterized, we propose naming elements after these clades. For instance, a family that is unambiguously related to other hobo elements would be named hobo-1_Acar instead of hAT-1_Acar.
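The naming pattern just described has the form <super-family or clade>-<number>[N<number>]_<species>, as in Helitron-1_Acar, Helitron-1N1_Acar and hobo-1_Acar. A small builder and parser makes the pattern concrete. These are hypothetical helpers, not part of any repeat-annotation tool. The regular expression encodes our reading of the pattern, with "N" marking a non-autonomous family.

```python
import re
from typing import Optional

# <super-family or clade>-<family number>[N<non-autonomous number>]_<species>
TE_NAME = re.compile(
    r"(?P<family>[A-Za-z0-9]+)-(?P<number>\d+)(?:N(?P<nonautonomous>\d+))?_(?P<species>[A-Za-z]+)"
)


def te_name(family: str, number: int, species: str,
            nonautonomous: Optional[int] = None) -> str:
    """Build a name such as Helitron-1_Acar or Helitron-1N1_Acar."""
    name = f"{family}-{number}"
    if nonautonomous is not None:
        name += f"N{nonautonomous}"
    return f"{name}_{species}"


def parse_te_name(name: str) -> dict:
    """Split a name back into its parts; raises ValueError if it does not fit."""
    match = TE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized transposable element name: {name!r}")
    parts = match.groupdict()
    parts["number"] = int(parts["number"])
    if parts["nonautonomous"] is not None:
        parts["nonautonomous"] = int(parts["nonautonomous"])
    return parts


print(te_name("Helitron", 1, "Acar", nonautonomous=1))  # Helitron-1N1_Acar
print(parse_te_name("hobo-1_Acar"))
```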

An additional difficulty in naming transposable elements arises from the common occurrence of horizontal transfer. As a consequence, identical or very similar elements might be found in distantly related organisms [39-42]. Novick et al. [41] proposed the letters HT to indicate that an element has been horizontally transferred from another species, e.g., hAT-HT1_Acar. However, this solution is not satisfactory, because genomes are annotated independently and the same elements might therefore carry different names in different organisms. For instance, the anole hAT-HT2_Acar differs from hAT2_ML of bats but is identical to hAT4 of Xenopus tropicalis. In such cases, we believe it is better not to use a numbering scheme and instead to choose a distinct name for families found in distantly related taxa. A name that at least partially reflects the evolutionary affinities of the elements is preferable. The solution adopted by Thomas et al. [42] for horizontally transferred helitrons, e.g., Heligloria, seems satisfactory.

As mentioned earlier, the classification and nomenclature of transposons is a work in progress that will require a better knowledge of transposable element evolution below the super-family level and across genomes. The committee intends to improve and update the classification of A. carolinensis elements regularly.

Table 2 Anolis species and proposed abbreviations

Anolis species Abbreviation
acutus Aacu
aeneus Aaen
aequatorialis Aaeq
agassizi Aaga
agueroi Aagu
ahli Aahl
alayoni Aala
alfaroi Aalf
aliniger Aali
allisoni Aals
allogus Aall
altavelensis Aalv
altitudinalis Aaln
alumina [illegible]
alutaceus Aalu
alvarezdeltoroi Aald
amplisquamosus Aamp
anatoloros Aana
anchicayae Aanc
anfiloquioi Aanf
angusticeps Aang
anisolepis Aani
annectens Aann
antioquiae Aano
antoni Aant
apletophallus Aapl
apollinaris Aapo
aquaticus Aaqu
argenteolus Aarg
argillaceus Aari
armouri Aarm
auratus Aaur
baccatus Abac
bahorucoensis Abah
baleatus Abal
baracoae Abao
barahonae Aban
barbatus Abab
barbouri Abar
barkeri [illegible]
bartschi Abat
beckeri Abec
bellipeniculus Abel
bicaorum Abic
bimaculatus Abim

Abbreviations for Anolis species and population groups

Comparative and functional genomics is rapidly progressing from broad-scale comparisons among model systems to fine-scale analyses among populations and closely related species [43-45]. Anolis is an ecologically, physiologically, and morphologically diverse genus of over 350 species with a rich history of comparative studies [4]. While the nomenclature described above establishes guidelines for the model system, A. carolinensis, it is critical that the research community arrive at a common vocabulary for referencing data from other Anolis species and among populations. The AGNC proposes the following guidelines with this aim:

♦ All genus and species abbreviations for anoles will begin with the capital letter 'A', followed by three lowercase italicized letters based approximately on the first letters of the species name, e.g., Anolis sagrei = Asag.

♦ In comparative analyses, abbreviations will be added as a suffix to the proper gene names, e.g., gene2-Asag.

♦ The three-letter species abbreviation suffix (in lowercase) is generated from the first two letters of the species name plus a third letter that is unique to each species. When two or more species names share the same first three letters, the species with the earliest date of first publication takes precedence and keeps those letters. For each remaining species, the third letter is replaced with the first subsequent letter of the species name that yields a unique code. Examples: A. grahami = Agra, since this species was first reported in 1845 [46]; A. gracilipes = Agrc; A. granuliceps = Agrn. A full listing of 378 abbreviations, based on our current view of the species content of Anolis, is given in Table 2 and posted to the anole community sites listed at the end of this report.

♦ Once established, the four-letter abbreviations should not be modified, even in cases of renaming or reclassification, in order to maintain clarity.

♦ This system of nomenclature does not address subspecies designations or geographic 'races.' The AGNC is currently accepting community proposals for these designations.
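The code-generation rule in the bullets above is procedural, so it can be sketched in a few lines. The Python below is one possible reading of that rule, not a tool released by the AGNC: the function name `assign_anolis_codes` is hypothetical, and the function assumes the caller already supplies species epithets ordered by date of first publication (earliest first), because publication dates are not encoded here. Apart from the 1845 date for A. grahami cited above, the orderings in the usage lines were chosen only so that the output matches entries printed in Table 2.

```python
from typing import Dict, Iterable, Set


def assign_anolis_codes(epithets_by_priority: Iterable[str]) -> Dict[str, str]:
    """Assign 'A' + three lowercase letters to each Anolis species epithet.

    Hypothetical helper illustrating the rule. The input must already be
    ordered by date of first publication (earliest first), so that an
    earlier species claims a contested code before any later one.
    """
    codes: Dict[str, str] = {}
    used: Set[str] = set()
    for epithet in epithets_by_priority:
        name = epithet.lower()
        prefix = "A" + name[:2]  # genus initial plus the first two letters
        code = None
        # Start with the third letter, then walk forward through the epithet
        # and keep the first candidate that no earlier species has claimed.
        for letter in name[2:]:
            candidate = prefix + letter
            if candidate not in used:
                code = candidate
                break
        if code is None:
            raise ValueError(f"no unique code can be built for {epithet!r}")
        used.add(code)
        codes[epithet] = code
    return codes


# A. grahami (1845) is listed first so it keeps "Agra"; the relative order of
# the other two species does not change their codes.
print(assign_anolis_codes(["grahami", "gracilipes", "granuliceps"]))
# {'grahami': 'Agra', 'gracilipes': 'Agrc', 'granuliceps': 'Agrn'}

# Ordering chosen to reproduce the Table 2 entries distichus = Adis and
# dissimilis = Adii: the repeated "s" is skipped because "Adis" is taken.
print(assign_anolis_codes(["distichus", "dissimilis"]))
# {'distichus': 'Adis', 'dissimilis': 'Adii'}
```

Because the search walks forward through the epithet, a code never contains a letter absent from the species name, and repeated letters that would recreate a taken code are passed over. The listing in Table 2, not this sketch, remains the reference.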


Table 2 (continued)

Anolis species Abbreviation
binotatus Abin
biporcatus Abip
birama Abir
biscutiger Abis
bitectus Abit
blanquillanus Abla
boettgeri Aboe
bombiceps Abom
bonairensis Abon
bouvieri Abou
breedlovei Abrd
bremeri Abrm
brevirostris Abre
brunneus Abru
calimae Acal
campbelli Acam
capito Acap
caquetae Acaq
carlostoddi Acao
carolinensis Acar
carpenteri Acae
casildae Acas
caudalis Acau
centralis Acen
chamaeleonides Acha
charlesmeyeri Ache
chloris Achi
chlorocyanus Achl
chocorum Acho
christophei Achs
chrysolepis Achr
clivicola Acli
cobanensis Acob
coelestinus Acoe
compressicauda Acom
concolor Acon
confusus Acof
conspersus Acos
cooki Acoo
crassulus Acra
cristatellus Acri
cristifer Acrs
cryptolimifrons Acry
cumingi Acum
cupeyalensis Acue
cupreus Acup
cuprinus Acur
cuscoensis Acuc
cusuco Acus
cuvieri Acuv
cyanopleurus Acya
cybotes Acyb
cymbops Acym
damulus Adam
danieli Adan
darlingtoni Adar
datzorum Adat
delafuentei Adef
deltae Adel
desechensis Ades
dissimilis Adii
distichus Adis
dolichocephalus Adoi
dollfusianus Adol
dominicanus Adom
duellmani Adue
dunni Adun
eewi Aeew
electrum Aele
equestris Aequ
ernestwilliamsi Aern
etheridgei Aeth
eugenegrahami Aeug
eulaemus Aeul
euskalerriari Aeus
evermanni Aeve
extremus Aext
fairchildi Afai
fasciatus Afas
ferreus Afer
festae Afes
fitchi Afit
forbesi Afor
fortunensis Afot
fowleri Afow
fraseri Afra
frenatus Afre
fugitivus Afug
fungosus Afun
fuscoauratus Afus
gadovi Agad
garmani Agar
garridoi Agai
gemmosus Agem
gibbiceps Agib
gingivinus Agin
godmani Agod
gorgonae Agor
gracilipes Agrc
grahami Agra
granuliceps Agrn
greyi Agre
griseus Agri
gruuo Agru
guafe Aguf
guamuhaya Agua
guazuma Aguz
gundlachi Agun
haetianus Ahae
haguei Ahag
hendersoni Ahen
heterodermus Ahet
heteropholidotus Ahee
hobartsmithi Ahob
homolechis Ahom
huilae Ahui
humilis Ahum
ibague Aiba
ibanezi Aibn
imias Aimi
impetigosus Aimp
incredulus Ainc
inderenae Aind
inexpectata Aine
insignis Ains
insolitus Aino
isolepis Aiso
isthmicus Aist
jacare Ajac
johnmeyeri Ajoh
juangundlachi Ajua
jubar Ajub
kemptoni Akem
koopmani Akoo
kreutzi Akre
krugi Akru
kunayalae Akun
laevis Alav
laeviventris Alae
lamari Alam
latifrons Alat
leachi Alea
lemniscatus Alen
lemurinus Alem
limifrons Alim
lineatopus Alie
lineatus Alin
liogaster Alig
lionotus Alio
litoralis Alit
lividus Aliv
longiceps Alon
longitibialis Alog
loveridgei Alov
loysianus Aloy
luciae Alua
lucius Aluc
luteogularis Alus
luteosignifer Alut
lynchi Alyn
lyra Alyr
macilentus Amai
macrini Aman
macrolepis Amal
macrophallus Amap
maculigula Amau
maculiventris Amac
magnaphallus Amag
marcanoi Amaa
mariarum Amar
marmoratus Amam
marron Amao
marsupialis Amas
matudai Amat
maynardi Amay
medemi Amed
megalopithecus Ameg
menta Amen
meridionalis Amer
mestrei Ames
microlepidotus Amip
microtus Amic
milleri Amil
mirus Amir
monensis Amoe
monteverde Amot
monticola Amon
morazani Amor
muralla Amur
nasofrontalis Anas
naufragus Anau
neblininus Anei
nebuloides Aneu
nebulosus Aneb
nelsoni Anel
nicefori Anic
nitens Anit
noblei Anob
notopholis Anot
nubilis Anub
occultus Aocc
ocelloscapularis Aoce
oculatus Aocu
olssoni Aols
omiltemanus Aomi
onca Aonc
opalinus Aopa
ophiolepis Aoph
oporinus Aopo
orcesi Aorc
ortoni Aort
otongae Aoto
pachypus Apac
paravertebralis Apaa
parilis Apai
parvicirculatus Apar
paternus Apat
pentaprion Apen
peraccae Aper
petersi Apet
philopunctatus Aphi
phyllorhinus Aphy
pigmaequestris Apig
pijolense Apij
pinchoti Apin
placidus Apla
poecilopus Apoe
pogus Apog
polylepis Apol
polyrhachis Apoh
poncensis Apon
porcatus Apor
porcus Apoc
princeps Apri
proboscis Apro
propinquus Aprp
pseudokemptoni Apsk
pseudopachypus Apsp
pseudotigrinus Apse
pulchellus Apul
pumilus Apum
punctatus Apun
purpurescens Apur
purpurgularis Apug
pygmaeus Apyg
quadriocellifer Aqud
quaggulus Aqua
quercorum Aque
reconditus Arec
rejectus Arej
rhombifer Arho
richardi Arih
ricordi Aric
rimarum Arim
rivalis Ariv
roatanensis Aroa
rodriguezi Arod
roosevelti Aroo
roquet Aroq
rubribarbaris Arua
rubribarbus Arub
ruibali Arul
ruizi Arui
rupinae Arup
sabanus Asab
sagrei Asag
salvini Asal
santamartae Asan
schiedi Asch
schmidti Ascm
schwartzi Ascw
scriptus Ascr
scypheus Ascy
semilineatus Asem
sericeus Aser
serranoi Asea
sheplani Ashe
shrevei Ashr
simmonsi Asim
singularis Asin
smallwoodi Asml
smaragdinus Asma
sminthus Asmi
soinii Asoi
solitarius Asol
spectrum Aspe
squamulatus Asqu
strahmi Asta
stratulus Astr
subocularis Asub
sulcifrons Asul
tandai Atan
taylori Atay
terraealtae Ater
terueli Ateu
tetarii Atet
tigrinus Atig
toldo Atod
tolimensis Atol
townsendi Atow
trachyderma Atrc
transversalis Atra
trinitatus Atri
tropidogaster Atro
tropidolepis Atri
tropidonotus Atrp
umbrivagus Aumb
uniformis Auni
unilobatus Aunl
utilensis Auti
utowanae Auto
valencienni Aval
vanidicus Avan
vanzolinii Avaz
vaupesianus Avau
ventrimaculatus Aven
vermiculatus Aver
vescus Aves
vicarius Avic
villai Avil
vittigerus Avit
wampuensis Awam
wattsi Awat
websteri Aweb
wellbornae Awel
wermuthi Awer
whitemani Awhi
williamsi Awil
williamsmittermeierorum Awim
woodi Awoo
yoroensis Ayor
zeus Azeu

Abbreviations for conserved sequences

A subclass of sequences can be defined by their high degree of conservation across taxonomic levels [47,48].



Nomenclature for these conserved sequences (CSs) 
poses unique challenges because they lack defining 
content, such as that comprising transposons and repe- 
titive elements. Additionally, CSs are not always com- 
pletely conserved and occasional duplicate CSs are 
scattered throughout the genome. We propose to 
describe CSs in the Anolis genome using a combina- 
tion of species code, unique identification number, 
length, percent conservation with other species, and 
characterization of species with which they are shared 
[49]. We recommend that: 

♦ CS names begin with the species code, Acar, to 
identify Anolis carolinensis as the species within 
which these sequences are described. 

♦ A unique, 1-indexed, arbitrarily assigned number follow the species code.

♦ Abbreviated length class designations follow the CS number. We define the length classes as follows: (s) short, ≤ 99 bp; (m) medium, 100-499 bp; or (l) long, ≥ 500 bp.

♦ A numeral representing percent conservation to the reference species ((1) 100-95%; (2) 94-90%; or (3) 89-85%) follow the length class designation.

♦ CS names end with an abbreviated indicator of the 
taxonomic span of conservation: (S) shared among 
Sauropsida, (M) shared among Mammalia, (B) 
shared among Batrachia, and (G) shared among 
Gymnophiona. 

Using this nomenclature, the 1,000th CS identified in the A. carolinensis genome, 600 bp long and 100% conserved between the A. carolinensis and chicken genomes, would be named Acar1000l1SMB.
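Because a CS name is a plain concatenation of five fields, a small helper can make the field order concrete. The Python below is a minimal sketch under stated assumptions, not an AGNC or lizardbase tool: `cs_name`, `length_class`, and `conservation_class` are hypothetical names; lengths of 99 bp or less are treated as short and 500 bp or more as long; conservation percentages are bucketed at the 95%, 90%, and 85% thresholds so that fractional values also fall into a class; and span letters are written in the S, M, B, G order used in the list above.

```python
from typing import Iterable

# Span letters in the order the text lists them (assumed canonical order).
SPAN_ORDER = "SMBG"


def length_class(length_bp: int) -> str:
    """Return 's', 'm', or 'l'; 99 bp and below is short, 500 bp and above is long (assumed)."""
    if length_bp < 1:
        raise ValueError("length must be a positive number of base pairs")
    if length_bp < 100:
        return "s"
    if length_bp < 500:
        return "m"
    return "l"


def conservation_class(percent: float) -> int:
    """Return 1 (>= 95%), 2 (>= 90%), or 3 (>= 85%) relative to the reference species."""
    if not 0 <= percent <= 100:
        raise ValueError("percent conservation must lie between 0 and 100")
    if percent >= 95:
        return 1
    if percent >= 90:
        return 2
    if percent >= 85:
        return 3
    raise ValueError("conservation below 85% has no class in this scheme")


def cs_name(number: int, length_bp: int, percent: float, spans: Iterable[str],
            species_code: str = "Acar") -> str:
    """Concatenate species code, CS number, length class, conservation class, spans."""
    if number < 1:
        raise ValueError("CS numbers are 1-indexed")
    span_set = set(spans)
    if not span_set or not span_set <= set(SPAN_ORDER):
        raise ValueError(f"spans must be a non-empty selection from {SPAN_ORDER!r}")
    span_code = "".join(letter for letter in SPAN_ORDER if letter in span_set)
    return (f"{species_code}{number}{length_class(length_bp)}"
            f"{conservation_class(percent)}{span_code}")


# The worked example from the text: CS number 1000, 600 bp, 100% conserved.
print(cs_name(1000, 600, 100.0, "SMB"))  # Acar1000l1SMB
```

Reordering the span letters canonically means "BS" and "SB" yield the same name, which keeps names comparable when spans are collected in arbitrary order; whether the committee intends a fixed order is an assumption of this sketch.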



Abbreviations for Anolis genetic markers including microsatellite assays

The A. carolinensis genome contains many types of 
repetitive elements including mononucleotide tracts, 
microsatellites, minisatellites, and satellites. Many 
researchers focus on simple tandem repeats (STRs, also 
known as short tandem repeats, microsatellites or sim- 
ple sequence repeats, SSRs). Some STRs have variable 
numbers of repeats (i.e., variable number tandem 
repeats, VNTRs). However, variation is often not 
reported with the genomic sequence and may be incon- 
sistent among populations and species, and knowledge 
of variation can change through time as more indivi- 
duals are sampled. Rather than subdividing and expli- 
citly defining the different repeat types or using VNTR 
status, we provide a simple, unique nomenclature that 
can be applied to all STRs in any species of Anolis. This 
nomenclature is linked to a more descriptive, locus-spe- 
cific annotation available from lizardbase. Additional 
detail regarding the challenges of explicitly defining var- 
ious classes of STRs has been described [50]. 

We propose that Anolis STRs be assigned a name 
consisting of three fields separated by underscores: 

1) the species code (described in Part 4 above) of the organism of origin,

2) the letters 'str' for simple tandem repeat, and

3) a unique, 1-indexed identification number.

Using this nomenclature, the 8th STR identified in the A. carolinensis genome would be coded as Acar_str_8. We will store additional locus-specific information, such as repeat unit, genomic location, and number of repeats, in a separate database linked to each STR by these unique names. The submission of STR markers and the assignment of unique identification numbers will be handled through lizardbase by the AGNC or a designated member.
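The three-field STR name can be generated and validated mechanically, for example before records are linked to the locus-specific database. The Python below is a sketch for illustration only: the function names and the regular expression are assumptions, not a lizardbase interface. The pattern assumes species codes take the four-letter 'A' + three lowercase letters form and that identification numbers are positive integers written without leading zeros.

```python
import re
from typing import Tuple

# Hypothetical pattern: four-letter Anolis species code, 'str', positive integer.
STR_NAME = re.compile(r"(A[a-z]{3})_str_([1-9][0-9]*)")


def str_name(species_code: str, number: int) -> str:
    """Build an STR name such as 'Acar_str_8' (three fields joined by underscores)."""
    if not re.fullmatch(r"A[a-z]{3}", species_code):
        raise ValueError(f"not a four-letter Anolis species code: {species_code!r}")
    if number < 1:
        raise ValueError("STR identification numbers are 1-indexed")
    return f"{species_code}_str_{number}"


def parse_str_name(name: str) -> Tuple[str, int]:
    """Split an STR name back into (species code, identification number)."""
    match = STR_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid STR name: {name!r}")
    return match.group(1), int(match.group(2))


print(str_name("Acar", 8))             # Acar_str_8
print(parse_str_name("Asag_str_120"))  # ('Asag', 120)
```

Rejecting leading zeros keeps each locus to a single spelling, so a name can serve directly as the key that links it to its locus-specific record.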

Conclusions 

Future objectives of the Anolis Gene Nomenclature Committee

The recently published green anole (A. carolinensis) genome [1] shows how a community of researchers with both common and distinct interests can work together to build an enduring resource. This genomic resource now gives the community an opportunity to advance knowledge of gene function and orthology. As work progresses on Anolis species genomes, new and unforeseen nomenclature issues will certainly arise. The goal of the AGNC is to foster community-based discussion through which these problems can be resolved. We have presented guidelines for three immediate objectives of the AGNC, but we foresee the need to address the following objectives rapidly:

♦ Nomenclature for populations and treatment of 
geographic variation 

♦ Creating a common nomenclature for genetic mar- 
kers such as microsatellites and SNPs 

♦ Creating a common nomenclature for transposable 
elements 

The AGNC welcomes feedback from the community 
to raise overlooked issues and unforeseen conflicts. The 
AGNC views these recommendations as an evolving 
document, and current, archival, and proposed revisions 
will be posted to the anole community web sites: 

lizardbase [28] 
Anolisgenome [51] 
Anolis Newsletter [52] 
Anole Annals Blog [53] 

Correspondence to any member of the committee is welcomed. We would also like to elicit comments and suggestions from other research communities with unannotated genomes. It would be valuable to develop and share such resources and experiences together.

List of abbreviations used 

AGNC: Anolis Gene Nomenclature Committee; BAC: bacterial artificial chromosome; CS: conserved sequence; ECC: evolutionary character code; GO: Gene Ontology; HCOP: HUGO Gene Nomenclature Committee Comparison of Orthology Predictions; HUGO: Human Genome Organization; mya: million years ago; OMA: Orthologous Matrix Project; STR: short tandem repeat; UCSC: University of California, Santa Cruz; VNTR: variable number tandem repeat; ZFIN: Zebrafish Information Network.